Conversation
ea90891 to
c0a5bdb
Compare
c0a5bdb to
78066a4
Compare
f160dd6 to
1c517e6
Compare
1c517e6 to
21f581a
Compare
52b711b to
713d0a2
Compare
pygmt/helpers/utils.py
Outdated
| return "geojson" | ||
|
|
||
| # A 2-D numpy.ndarray | ||
| if isinstance(data, np.ndarray) and data.ndim == 2: |
There was a problem hiding this comment.
We probably can change this line to:
| if isinstance(data, np.ndarray) and data.ndim == 2: | |
| if hasattr(data, "__array_interface__") and len(data.__array_interface__.shape) == 2: |
to support more array-like objects that implements the __array_interface__ protocol (https://numpy.org/doc/stable/reference/arrays.interface.html#object.__array_interface__), but it can be done in a future PR.
| case "file": | ||
| file_contexts.append(contextlib.nullcontext(track)) | ||
| case "matrix": | ||
| case "vectors": |
There was a problem hiding this comment.
pandas.DataFrame now is "vectors" kind.
pygmt/helpers/utils.py
Outdated
| case np.ndarray() if data.ndim == 2: # A 2-D numpy.ndarray object. | ||
| kind = "matrix" |
There was a problem hiding this comment.
Two things:
- Should we extend
kind="matrix"to 3-D numpy.ndarray objects? - I'm hesitant to match based on
np.ndarray()class only, because while it's not widely advertised, checking fordata is not Nonewould have caught other array types implementing the__array_function__protocol (see NEP18), but checking forisinstance(data, np.ndarray)would exclude those. That said, I'm wondering if the fallback tokind="vectors"might just work, but need to test this out more extensively.
There was a problem hiding this comment.
- Should we extend
kind="matrix"to 3-D numpy.ndarray objects?
It's unclear if we can pass a 3-D numpy array yet. Even if we can, it means more work, since in virtualfile_from_vectors/virtualfile_from_matrix, we explicitly check if the ndim=1 or 2. So, better to revisit the 3-D numpy array support in a separate PR if necessary.
- I'm hesitant to match based on
np.ndarray()class only, because while it's not widely advertised, checking fordata is not Nonewould have caught other array types implementing the__array_function__protocol (see NEP18), but checking forisinstance(data, np.ndarray)would exclude those. That said, I'm wondering if the fallback tokind="vectors"might just work, but need to test this out more extensively.
Yes, I was thinking about checking __array_interface__ in #3351 (comment).
There was a problem hiding this comment.
Done in 4a4f192, although other array-like objects are not tested.
| if data.dtype.kind not in "iuf": | ||
| _virtualfile_from = self.virtualfile_from_vectors |
There was a problem hiding this comment.
Missing test coverage for these lines?
There was a problem hiding this comment.
I can add a test for it. Before adding more tests, I'm wondering if we should split the big "test_clib_virtualfiles.py" file (with more than 500 lines) into separate smaller test files, i.e., one test file for each Session method.
test_clib_virtualfile_intest_clib_virtualfile_from_vectorstest_clib_virtualfile_from_matrixtest_clib_open_virtualfile
There was a problem hiding this comment.
We've split it before in #2784, so yes, ok to split it again 😆
There was a problem hiding this comment.
I've added a test in cfa32ed to cover this line.
Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
| z=data[:, 2], | ||
| data=data, | ||
| ) | ||
|
|
There was a problem hiding this comment.
This test is added to address #3351 (comment).
This new test passes a string dtype numpy array into GMT C API, which contains longitude/latitude strings. The data kind is "matrix", but since the data dtype is not in iuf, PyGMT will use virtualfile_from_vectors rather than virtualfile_from_matrix to pass the data into GMT C API. Ideally, we should check that the virtualfile_from_vectors is called once and virtualfile_from_matrix is not called, but I find it technically complicated with unittest.mock, so I'll leave it untested.
The function is marked as xfail with GMT 6.5 due to a newly found upstream bug at GenericMappingTools/gmt#8600.
There was a problem hiding this comment.
Good catch, and thanks for adding the test!
6b7a578 to
423e5dc
Compare
Need to wait for #3481.Description of proposed changes
As can be seen in the doctests added in #3480, currently,
matrixandvectorskinds are defined like below:vectorskind meansdata=Noneandrequired=True. It means that the input is given via a series of vectors (e.g., x/y/z).matrixrepresents any data types not recognized as arg/file/stringio/grid/image/geojson. The most common data types includepd.DataFrame,pd.Series,xr.Dataset,np.ndarray, and sequence of sequences.So,
matrixhere is an inclusive concept. However, in both Python and GMT,matrixusually means a homogenous 2-D array. When passing a "matrix" data to GMT, we usually treat the "matrix" as a series of vectors. The only exception is when the "matrix" data is a strict matrix (a 2-D numpy array) withdata.dtype.kind in "iuf".pygmt/pygmt/clib/session.py
Lines 1742 to 1755 in d7560fa
I think it makes more sense to use a more strict definition for
matrixand let all other data types bevectors. I propose to change the definitions of data kinds to:matrix: a 2-D homogenous numpy.ndarrayvectors: fallbacks to this kind for any unrecognized data types, including a dictionary,pd.DataFrame,pd.Series,xr.Dataset, nested lists/tuples, or any other unrecognized data types.The refactoring is mainly done in 808755d.
Of course,
data=None, required=Trueis still recognized as "vectors". I also think we should give this case another special kind like"none", which makes more sense and can also simplify the codes. [This is done in a separate PR at #3482, but I'm OK to merge #3482 into this PR first before merging this PR into main].Other data kinds are unchanged.
This is a breaking change since some previous "matrix" data are now recognized "vectors" and previously "vectors" is "none". But I guess very few users are using this function, so the breaking change should have minimal effects.